EXTRACTING CONTENT IN REGIONAL WEB DOCUMENTS WITH TEXT VARIATIONS
Similar resources
Text Content Reliability Estimation in Web Documents: A New Proposal
This paper illustrates how a combination of information retrieval, machine learning, and NLP corpus annotation techniques was applied to a problem of text content reliability estimation in Web documents. Our proposal for text content reliability estimation is based on a model in which reliability is a similarity measure between the content of the documents and a knowledge corpus. The proposal i...
Extracting Financial Information from Text Documents
The majority of electronic data today is in textual form. Financial data such as articles in the Wall Street Journal are written as texts. These electronic documents contain a wealth of information but require human interpretation. For financial analysis, rapid up-to-date information is critical. Most software tools currently require data which are better structured than text (such as data in r...
Extracting Interlinear Glossed Text from LaTeX Documents
We present texigt, a command-line tool for the extraction of structured linguistic data from LaTeX source documents, and a language resource that has been generated using this tool: a corpus of interlinear glossed text (IGT) extracted from open access books published by Language Science Press. Extracted examples are represented in a simple XML format that is easy to process and can be used to va...
Detecting and Extracting Events from Text Documents
Events of various kinds are mentioned and discussed in text documents, whether they are books, news articles, blogs or microblog feeds. The paper starts by giving an overview of how events are treated in linguistics and philosophy. We follow this discussion by surveying how events and associated information are handled computationally. In particular, we look at how textual documents can be m...
Journal
Journal title: International Journal on Information Sciences and Computing
Year: 2014
ISSN: 0973-9092
DOI: 10.18000/ijisac.50144